{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfGpInivS0fG"
   },
   "source": [
    "<h2 align=\"center\">点击下列图标在线运行HanLP</h2>\n",
    "<div align=\"center\">\n",
    "\t<a href=\"https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_stl.ipynb\" target=\"_blank\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\t<a href=\"https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fdep_stl.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" alt=\"Open In Binder\"/></a>\n",
    "</div>\n",
    "\n",
    "## 安装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IYwV-UkNNzFp"
   },
   "source": [
    "无论是Windows、Linux还是macOS，HanLP的安装只需一句话搞定："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Uf_u7ddMhUt"
   },
   "outputs": [],
   "source": [
    "!pip install hanlp -U"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pp-1KqEOOJ4t"
   },
   "source": [
    "## 加载模型\n",
    "HanLP的工作流程是先加载模型，模型的标示符存储在`hanlp.pretrained`这个包中，按照NLP任务归类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "4M7ka0K5OMWU",
    "outputId": "69cdad22-d94d-41fb-9591-1c29515a3da9"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'CTB5_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb5_20191229_025833.zip',\n",
       " 'CTB7_BIAFFINE_DEP_ZH': 'https://file.hankcs.com/hanlp/dep/biaffine_ctb7_20200109_022431.zip',\n",
       " 'CTB9_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/ctb9_dep_electra_small_20220216_100306.zip',\n",
       " 'PMT1_DEP_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/pmt_dep_electra_small_20220218_134518.zip',\n",
       " 'CTB9_UDC_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/dep/udc_dep_electra_small_20220218_095452.zip',\n",
       " 'PTB_BIAFFINE_DEP_EN': 'https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import hanlp\n",
    "hanlp.pretrained.dep.ALL # 语种见名称最后一个字段或相应语料库"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "BMW528wGNulM"
   },
   "source": [
    "调用`hanlp.load`进行加载，模型会自动下载到本地缓存："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "id": "0tmKBu7sNAXX"
   },
   "outputs": [],
   "source": [
    "dep = hanlp.load(hanlp.pretrained.dep.CTB9_DEP_ELECTRA_SMALL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "elA_UyssOut_"
   },
   "source": [
    "## 依存句法分析\n",
    "依存句法分析任务的输入为已分词的一个或多个句子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "id": "BqEmDMGGOtk3"
   },
   "outputs": [],
   "source": [
    "tree = dep([\"2021年\", \"HanLPv2.1\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jj1Jk-2sPHYx"
   },
   "source": [
    "返回对象为[CoNLLSentence](https://hanlp.hankcs.com/docs/api/common/conll.html#hanlp_common.conll.CoNLLSentence)类型："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "U_PGm06m6K20",
    "outputId": "a25c6452-5032-42b3-d501-99158380c487"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'id': 1,\n",
       "  'form': '2021年',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 3,\n",
       "  'deprel': 'tmod',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 2,\n",
       "  'form': 'HanLPv2.1',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 3,\n",
       "  'deprel': 'nsubj',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 3,\n",
       "  'form': '带来',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 0,\n",
       "  'deprel': 'root',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 4,\n",
       "  'form': '次',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 5,\n",
       "  'deprel': 'det',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 5,\n",
       "  'form': '世代',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 7,\n",
       "  'deprel': 'dep',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 6,\n",
       "  'form': '最',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 7,\n",
       "  'deprel': 'advmod',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 7,\n",
       "  'form': '先进',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 12,\n",
       "  'deprel': 'rcmod',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 8,\n",
       "  'form': '的',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 7,\n",
       "  'deprel': 'cpm',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 9,\n",
       "  'form': '多',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 10,\n",
       "  'deprel': 'nummod',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 10,\n",
       "  'form': '语种',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 12,\n",
       "  'deprel': 'nn',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 11,\n",
       "  'form': 'NLP',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 12,\n",
       "  'deprel': 'nn',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 12,\n",
       "  'form': '技术',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 3,\n",
       "  'deprel': 'dobj',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None},\n",
       " {'id': 13,\n",
       "  'form': '。',\n",
       "  'cpos': None,\n",
       "  'pos': None,\n",
       "  'head': 3,\n",
       "  'deprel': 'punct',\n",
       "  'lemma': None,\n",
       "  'feats': None,\n",
       "  'phead': None,\n",
       "  'pdeprel': None}]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tree"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Gn_RQa_Z6K20"
   },
   "source": [
    "打印时为CoNLL格式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "26P1LGzv6K20",
    "outputId": "c78ffdb0-3cd7-492d-f55e-0d50120faffb"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\t2021年\t_\t_\t_\t_\t3\ttmod\t_\t_\n",
      "2\tHanLPv2.1\t_\t_\t_\t_\t3\tnsubj\t_\t_\n",
      "3\t带来\t_\t_\t_\t_\t0\troot\t_\t_\n",
      "4\t次\t_\t_\t_\t_\t5\tdet\t_\t_\n",
      "5\t世代\t_\t_\t_\t_\t7\tdep\t_\t_\n",
      "6\t最\t_\t_\t_\t_\t7\tadvmod\t_\t_\n",
      "7\t先进\t_\t_\t_\t_\t12\trcmod\t_\t_\n",
      "8\t的\t_\t_\t_\t_\t7\tcpm\t_\t_\n",
      "9\t多\t_\t_\t_\t_\t10\tnummod\t_\t_\n",
      "10\t语种\t_\t_\t_\t_\t12\tnn\t_\t_\n",
      "11\tNLP\t_\t_\t_\t_\t12\tnn\t_\t_\n",
      "12\t技术\t_\t_\t_\t_\t3\tdobj\t_\t_\n",
      "13\t。\t_\t_\t_\t_\t3\tpunct\t_\t_\n"
     ]
    }
   ],
   "source": [
    "print(tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果不需要CoNLL格式的话，也许`conll=False`时的输出更加简洁："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(3, 'tmod'),\n",
       " (3, 'nsubj'),\n",
       " (0, 'root'),\n",
       " (5, 'det'),\n",
       " (7, 'dep'),\n",
       " (7, 'advmod'),\n",
       " (12, 'rcmod'),\n",
       " (7, 'cpm'),\n",
       " (10, 'nummod'),\n",
       " (12, 'nn'),\n",
       " (12, 'nn'),\n",
       " (3, 'dobj'),\n",
       " (3, 'punct')]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dep([\"2021年\", \"HanLPv2.1\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"], conll=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 可视化\n",
    "你可以构造一个`Document`实现漂亮的可视化："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"display: table; line-height: 128%;\"><pre style=\"display: table-cell; font-family: SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace; white-space: nowrap;\">Dep&nbsp;Tree&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>─────────────&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;┌──►&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│┌─►&nbsp;<br>┌┬───────┴┴──&nbsp;<br>││&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;┌─►&nbsp;<br>││&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;┌─►└──&nbsp;<br>││&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;┌─►&nbsp;<br>││&nbsp;&nbsp;┌─►└──┼──&nbsp;<br>││&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└─►&nbsp;<br>││&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;┌─►&nbsp;<br>││&nbsp;&nbsp;│&nbsp;&nbsp;┌─►└──&nbsp;<br>││&nbsp;&nbsp;│&nbsp;&nbsp;│&nbsp;&nbsp;┌─►&nbsp;<br>│└─►└──┴──┴──&nbsp;<br>└───────────►&nbsp;</pre><pre style=\"display: table-cell; font-family: SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace; white-space: nowrap;\">Token&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>─────────&nbsp;<br>2021年&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>HanLPv2.1&nbsp;<br>带来&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>次&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>世代&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>最&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>先进&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>的&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>多&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>语种&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>NLP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>技术&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>。&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</pre><pre style=\"display: table-cell; font-family: SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace; white-space: nowrap;\">Relati<br>──────<br>tmod&nbsp;&nbsp;<br>nsubj&nbsp;<br>root&nbsp;&nbsp;<br>det&nbsp;&nbsp;&nbsp;<br>dep&nbsp;&nbsp;&nbsp;<br>advmod<br>rcmod&nbsp;<br>cpm&nbsp;&nbsp;&nbsp;<br>nummod<br>nn&nbsp;&nbsp;&nbsp;&nbsp;<br>nn&nbsp;&nbsp;&nbsp;&nbsp;<br>dobj&nbsp;&nbsp;<br>punct&nbsp;</pre></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from hanlp_common.document import Document\n",
    "doc = Document(\n",
    "    tok=[\"2021年\", \"HanLPv2.1\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"],\n",
    "    dep=[(3, 'tmod'), (3, 'nsubj'), (0, 'root'), (5, 'det'), (7, 'dep'), (7, 'advmod'), (12, 'rcmod'), (7, 'cpm'), (10, 'nummod'), (12, 'nn'), (12, 'nn'), (3, 'dobj'), (3, 'punct')]\n",
    ")\n",
    "doc.pretty_print()"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "dep_stl.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
